Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems
A Multi-Agent Cooperative Learning (MACL) system is an artificial
intelligence (AI) system in which multiple learning agents work together to
complete a common task. The recent empirical success of MACL systems in various
domains (e.g. traffic control, cloud computing, robotics) has sparked active
research into the design and analysis of MACL systems for sequential
decision-making problems. An important performance metric for a learning
algorithm in such problems is its regret, i.e. the difference between the
highest achievable reward and the reward the algorithm actually obtains.
Designing and developing MACL systems with low-regret learning algorithms can
create substantial economic value. In this thesis, I analyze MACL systems for
several sequential decision-making problems. Concretely, Chapters 3 and 4
investigate cooperative multi-agent multi-armed bandit problems with
full-information or bandit feedback, in which multiple learning agents can
exchange information through a communication network and, under bandit
feedback, each agent observes only the rewards of the actions it chooses.
Chapter 5 considers the communication-regret trade-off for online convex
optimization in the distributed setting. Chapter 6 discusses how to form
highly productive teams of agents, based on their unknown but fixed types,
using adaptive incremental matchings. For each of these problems, I present
regret lower bounds for feasible learning algorithms and provide efficient
algorithms that achieve these bounds. The regret bounds in Chapters 3, 4 and
5 quantify how the regret depends on the connectivity of the communication
network and on the communication delay, thus giving useful guidance for the
design of communication protocols in MACL systems.
Comment: Thesis submitted to the London School of Economics and Political
Science for PhD in Statistics
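The regret notion in the thesis abstract can be made concrete with a small computation. This is a minimal sketch, assuming the common comparator "best single action in hindsight" and a hypothetical reward table; individual chapters may define the comparator differently (e.g. per agent or averaged over agents).

```python
def regret(reward_table, chosen):
    """Regret against the best fixed action in hindsight.

    reward_table[t][a]: reward of action a at round t (hypothetical data).
    chosen[t]: the action the learner played at round t.
    """
    n_actions = len(reward_table[0])
    # Total reward of the best single action, chosen with hindsight.
    best = max(sum(row[a] for row in reward_table) for a in range(n_actions))
    # Total reward the learner actually collected.
    gained = sum(row[chosen[t]] for t, row in enumerate(reward_table))
    return best - gained


rewards = [[1, 0], [0, 1], [1, 0]]
print(regret(rewards, [1, 0, 1]))  # 2
print(regret(rewards, [0, 1, 0]))  # -1
```

Because the comparator is a fixed action, regret can be negative when the learner switches actions well, as in the second call.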
On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
We consider the nonstochastic multi-agent multi-armed bandit problem with
agents collaborating via a communication network with delays. We show a lower
bound on the individual regret of all agents. We show that, with suitable
regularizers and communication protocols, a collaborative multi-agent
follow-the-regularized-leader (FTRL) algorithm has an individual regret
upper bound that matches the lower bound up to a constant factor when the
number of arms is large enough relative to the degrees of the agents in the
communication graph. We also show that an FTRL algorithm with a suitable
regularizer is regret optimal with respect to its scaling with the edge-delay
parameter. We present numerical experiments that validate our theoretical
results and demonstrate cases in which our algorithms outperform previously
proposed algorithms.
Comment: Published in AAMAS 202
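The collaborative FTRL idea above can be illustrated in a heavily simplified form. This is a minimal sketch, not the paper's algorithm: it assumes the negative-entropy regularizer (for which FTRL has the closed form of exponential weights), no communication delay, a hypothetical 3-agent path graph, and a hypothetical loss sequence. Each agent builds importance-weighted loss estimates from the arms played in its closed neighbourhood. The names `ftrl_entropy`, `coop_round` and `run` are illustrative only.

```python
import math
import random


def ftrl_entropy(cum_loss_est, eta):
    """FTRL on the simplex with a negative-entropy regularizer.

    argmin_p <p, L> + (1/eta) * sum_i p_i log p_i has the closed form
    p_i proportional to exp(-eta * L_i) (exponential weights).
    """
    shift = min(cum_loss_est)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (x - shift)) for x in cum_loss_est]
    total = sum(weights)
    return [w / total for w in weights]


def coop_round(probs, neighbors, losses, rng):
    """One round of a simplified cooperative bandit step with no delay.

    probs[v]: agent v's distribution over arms.
    neighbors[v]: agents adjacent to v (v itself excluded).
    losses[i]: loss of arm i this round, shared by all agents (hypothetical).
    Returns each agent's importance-weighted loss estimates.
    """
    k = len(losses)
    arms = [rng.choices(range(k), weights=p)[0] for p in probs]
    estimates = []
    for v in range(len(probs)):
        group = [v] + neighbors[v]
        seen = {arms[u] for u in group}
        est = []
        for i in range(k):
            # Probability that some agent in v's closed neighbourhood plays arm i.
            q = 1.0 - math.prod(1.0 - probs[u][i] for u in group)
            # q > 0 whenever arm i was played; the check only guards float underflow.
            est.append(losses[i] / q if i in seen and q > 0.0 else 0.0)
        estimates.append(est)
    return estimates


def run(T=2000, k=5, eta=0.05, seed=0):
    rng = random.Random(seed)
    neighbors = [[1], [0, 2], [1]]  # hypothetical 3-agent path graph
    cum = [[0.0] * k for _ in neighbors]
    for _ in range(T):
        probs = [ftrl_entropy(c, eta) for c in cum]
        # Hypothetical loss sequence: arm 0 has the smallest mean loss.
        losses = [rng.random() * (0.5 if i == 0 else 1.0) for i in range(k)]
        for c, est in zip(cum, coop_round(probs, neighbors, losses, rng)):
            for i in range(k):
                c[i] += est[i]
    return [ftrl_entropy(c, eta) for c in cum]
```

Calling `run()` returns each agent's final distribution over arms; in this toy instance the agents should place most of their mass on arm 0. Sharing observations with neighbours raises the probability q that an arm's loss is seen, which lowers the variance of the estimates; delays and the paper's other regularizers are not modelled.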
Doubly Adversarial Federated Bandits
We study a new non-stochastic federated multi-armed bandit problem with
multiple agents collaborating via a communication network. The losses of the
arms are assigned by an oblivious adversary that specifies the loss of each arm
not only for each time step but also for each agent, a setting we call "doubly
adversarial". In this setting, different agents may choose the same arm in the
same time step but observe different feedback. The goal of each agent is to
find a globally best arm in hindsight, i.e. the arm with the lowest cumulative
loss averaged over all agents, which necessitates communication among agents.
We provide regret lower bounds for any federated bandit algorithm in two
settings: when agents have access to full-information feedback and when they
have bandit feedback. For the bandit feedback setting, we propose a
near-optimal federated bandit algorithm called FEDEXP3. Our algorithm gives a
positive answer to an open question posed in Cesa-Bianchi et al. (2016):
FEDEXP3 can guarantee sub-linear regret without exchanging sequences of
selected arm identities or loss sequences among agents. We also provide
numerical evaluations of our algorithm that validate our theoretical results
and demonstrate its effectiveness on synthetic and real-world datasets.
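To see why the "globally best arm" objective of the doubly adversarial setting forces communication, the sketch below computes it from a full loss table. It does not implement FEDEXP3; the loss table and the function name `global_best_arm` are hypothetical, chosen so that each agent's locally best arm differs from the global one.

```python
def global_best_arm(losses):
    """losses[t][v][i]: loss of arm i for agent v at round t (oblivious adversary).

    Returns the arm with the lowest cumulative loss averaged over agents,
    together with each agent's local best arm for comparison.
    """
    T, n, k = len(losses), len(losses[0]), len(losses[0][0])
    # Cumulative loss of each arm, separately for each agent.
    per_agent = [[sum(losses[t][v][i] for t in range(T)) for i in range(k)]
                 for v in range(n)]
    # Average over agents: the quantity the global comparator minimizes.
    avg = [sum(per_agent[v][i] for v in range(n)) / n for i in range(k)]
    global_best = min(range(k), key=lambda i: avg[i])
    local_best = [min(range(k), key=lambda i: per_agent[v][i]) for v in range(n)]
    return global_best, local_best


one_round = [[0.0, 1.0, 0.4],   # agent 0: arm 0 looks best locally
             [1.0, 0.0, 0.4]]   # agent 1: arm 1 looks best locally
print(global_best_arm([one_round, one_round]))  # (2, [0, 1])
```

Neither agent can identify arm 2 from its own losses alone, since each agent's local best arm is different.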
Pair density wave, unconventional superconductivity, and non-Fermi liquid quantum critical phase in frustrated Kondo lattice
Motivated by the recent discovery of an intermediate quantum critical phase
between the antiferromagnetic order and the Fermi liquid in the frustrated
Kondo lattice CePdAl, we study here a Kondo-Heisenberg chain with frustrated
- XXZ interactions among local spins using the density matrix
renormalization group method. Our simulations reveal a global phase diagram
with rich ground states including the antiferromagnetic order, the
valence-bond-solid and bond-order-wave orders, the pair density wave state, the
uniform superconducting state, and the Luttinger liquid state. We show that
both the pair density wave and uniform superconductivity belong to the family
of Luther-Emery liquids and may arise from pair instability of an intermediate
quantum critical phase with medium Fermi volume in the presence of strong
quantum fluctuations, while the Luttinger liquid has a large Fermi volume. This
suggests a deep connection between the pair density wave, the unconventional
superconductivity, and the non-Fermi liquid quantum critical phase.
Comment: 10 pages, 9 figures
Pure exploration and regret minimization in matching bandits
Finding an optimal matching in a weighted graph is a standard combinatorial problem. We consider its semi-bandit version, in which either a single pair or a full matching is sampled sequentially. We prove that a rank-1 assumption on the adjacency matrix can be leveraged to reduce the sample complexity and the regret of off-the-shelf algorithms, to the point of reaching a linear dependence on the number of vertices (up to polylogarithmic terms).
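The abstract does not say how the rank-1 structure is exploited. One well-known consequence, shown here only for a bipartite graph with weights w_ij = u_i * v_j, is that a maximum-weight perfect matching follows from sorting (the rearrangement inequality), which hints at why far fewer than n^2 weights may need to be estimated. The vectors and function names are hypothetical, and the brute-force search is only a check.

```python
from itertools import permutations


def sorted_matching(u, v):
    """Pair the i-th largest u with the i-th largest v (weights w_ij = u_i * v_j)."""
    ru = sorted(range(len(u)), key=lambda i: -u[i])
    rv = sorted(range(len(v)), key=lambda j: -v[j])
    return list(zip(ru, rv))


def matching_weight(u, v, pairs):
    """Total weight of a matching under the rank-1 weights."""
    return sum(u[i] * v[j] for i, j in pairs)


def brute_force_best(u, v):
    """Exhaustive search over all perfect matchings (only for small n)."""
    n = len(u)
    return max(matching_weight(u, v, list(zip(range(n), perm)))
               for perm in permutations(range(n)))


u = [3, 1, 4, 1, 5]
v = [9, 2, 6, 5, 3]
print(matching_weight(u, v, sorted_matching(u, v)), brute_force_best(u, v))  # 89 89
```

Sorting costs O(n log n), while the brute-force check grows as n!, so it is only practical for tiny examples like this one.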
Lightweight object detection algorithm based on YOLOv5 for unmanned surface vehicles
Visual detection technology is essential for an unmanned surface vehicle (USV) to perceive its surrounding environment: it determines the spatial position and category of objects, which provides important environmental information for the path planning and collision avoidance of the USV. During a close-in reconnaissance mission, a USV must navigate swiftly in a complex maritime environment. Therefore, an object detection algorithm used in USVs should have both high detection speed and high accuracy. In this paper, a lightweight YOLOv5-based object detection algorithm using a Ghost module and a Transformer is proposed for USVs. First, in the backbone network, the original convolution operation in YOLOv5 is replaced by the Ghost module, which stacks an ordinary convolution with depth-wise convolution. Second, to enhance feature extraction without increasing network depth, we integrate a Transformer at the end of the backbone network and into the Feature Pyramid Network structure of YOLOv5, which improves the network's ability to express features. Finally, the proposed algorithm and six other deep learning algorithms were tested on ship datasets. The results show that the average accuracy of the proposed algorithm is higher than that of the other six algorithms. In particular, compared with the original YOLOv5 model, the model size of the proposed algorithm is reduced to 12.24 M, the frame rate reaches 138 frames per second, the detection accuracy is improved by 1.3%, and the mean average precision at an IoU threshold of 0.5 (mAP@0.5) reaches 96.6% (up from 95.3%). In the verification experiment, the proposed algorithm was tested on ship videos collected by the “JiuHang 750” USV in different marine environments. The test results show that the proposed algorithm achieves significantly higher detection accuracy than other lightweight detection algorithms.
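The parameter saving from swapping a convolution for a Ghost module can be estimated with simple arithmetic. This sketch assumes the Ghost module design from GhostNet: a k x k primary convolution produces c_out/s "intrinsic" feature maps, and each of them yields s-1 extra maps through a cheap d x d depth-wise operation. The channel counts, s=2 and d=3 are hypothetical, biases and batch-norm parameters are ignored, and the YOLOv5 variant in the paper may use different kernel sizes.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights of a Ghost module (bias and batch norm ignored).

    A k x k primary conv makes m = c_out // s intrinsic maps; each intrinsic
    map then yields (s - 1) ghost maps via a d x d depth-wise operation.
    """
    assert c_out % s == 0, "c_out must be divisible by s"
    m = c_out // s
    return c_in * m * k * k + m * (s - 1) * d * d


std = conv_params(128, 256, 3)
ghost = ghost_params(128, 256, 3)
print(std, ghost, round(std / ghost, 2))  # 294912 148608 1.98
```

With s=2 the primary convolution dominates the cost, so the module needs roughly half the weights of the convolution it replaces. Larger s shrinks the model further at the price of more maps produced by cheap operations.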
- …